Generalizing Subcategorization Frames Acquired from Corpora Using Lexicalized Grammars
نویسندگان
چکیده
This paper presents a method of improving the quality of subcategorization frames (SCFs) acquired from corpora in order to augment a lexicon of a lexicalized grammar. We first estimate a confidence value that a word can have each SCF, and create an SCF confidence-value vector for each word. Since the SCF confidence vectors obtained from the lexicon of the target grammar involve co-occurrence tendency among SCFs for words, we can improve the quality of the acquired SCFs by clustering vectors obtained from the acquired SCF lexicon and the lexicon of the target grammar. We apply our method to SCFs acquired from corpora by using a subset of the SCF lexicon of the XTAG English grammar. A comparison between the resulting SCF lexicon and the rest of the lexicon of the XTAG English grammar reveals that we can achieve higher precision and recall compared to naive frequency cut-off.
منابع مشابه
Improving the Accuracy of Subcategorizations Acquired from Corpora
This paper presents a method of improving the accuracy of subcategorization frames (SCFs) acquired from corpora to augment existing lexicon resources. I estimate a confidence value of each SCF using corpus-based statistics, and then perform clustering of SCF confidencevalue vectors for words to capture cooccurrence tendency among SCFs in the lexicon. I apply my method to SCFs acquired from corp...
متن کاملEnhancing FreeLing Rule-Based Dependency Grammars with Subcategorization Frames
Despite the recent advances in parsing, significant efforts are needed to improve the current parsers performance, such as the enhancement of the argument/adjunct recognition. There is evidence that verb subcategorization frames can contribute to parser accuracy, but a number of issues remain open. The main aim of this paper is to show how subcategorization frames acquired from a syntactically ...
متن کاملTowards Accurate Probabilistic Lexicons for Lexicalized Grammars
This paper proposes a method of constructing an accurate probabilistic subcategorization (SCF) lexicon for a lexicalized grammar extracted from a treebank. We employ a latent variable model to smooth co-occurrence probabilities between verbs and SCF types in the extracted lexicalized grammar. We applied our method to a verb SCF lexicon of an HPSG grammar acquired from the Penn Treebank. Experim...
متن کاملA Subcategorization Frames Acquisition System for French Verbs
This paper presents a system intended to automatically acquire subcategorization frames (SCFs) of verbs from the analysis of large corpora. The system has been applied to a newspaper corpus (made of 10 years of the French newspaper Le Monde) and acquired subcategorization information for 3267 verbs. 286 SCFs were dynamically learnt for these verbs. From the analysis of 25 representative verbs, ...
متن کاملCombining Bayesian and Support Vector Machines Learning to automatically complete Syntactical Information for HPSG-like Formalisms
Learning Bayesian Belief Networks (BBN) from corpora and incorporating the extracted inferring knowledge with a Support Vector Machines (SVM) classifier has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We have made use of minimal linguistic resources, such as basic morphological tagging and phrase chunking, to demonstrate that verb subcategorizati...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004